# Native Multimodal Pretraining

InternVL3 38B Instruct GGUF
Apache-2.0
InternVL3-38B-Instruct is an advanced Multimodal Large Language Model (MLLM) that demonstrates exceptional overall performance, with strong multimodal perception and reasoning capabilities.
Image-to-Text Transformers
unsloth
1,236
2
InternVL3 8B GGUF
Apache-2.0
InternVL3 is an advanced multimodal large language model series, demonstrating exceptional overall performance with robust multimodal perception and reasoning capabilities.
Image-to-Text Transformers
unsloth
4,810
3
InternVL3 9B AWQ
MIT
InternVL3-9B is a multimodal large language model from the InternVL3 series, featuring exceptional multimodal perception and reasoning capabilities. It supports various application scenarios such as tool usage, GUI agents, industrial image analysis, and 3D visual perception.
Text-to-Image Transformers Other
OpenGVLab
214
1
InternVL3 8B AWQ
Other
InternVL3-8B is an advanced multimodal large language model developed by OpenGVLab, featuring powerful multimodal perception and reasoning capabilities, supporting tool calling, GUI agents, industrial image analysis, 3D visual perception, and other emerging fields.
Image-to-Text Transformers Other
OpenGVLab
1,441
3
InternVL3 2B AWQ
Other
InternVL3-2B is an advanced Multimodal Large Language Model (MLLM) developed by OpenGVLab, featuring exceptional multimodal perception and reasoning capabilities, supporting tool usage, GUI agents, industrial image analysis, 3D visual perception, and more.
Transformers Other
OpenGVLab
677
1
InternVL3 1B AWQ
Other
InternVL3-1B is a multimodal large language model in the InternVL3 series, featuring exceptional multimodal perception and reasoning capabilities.
Text-to-Image Transformers Other
OpenGVLab
303
1
InternVL3 2B Pretrained
Apache-2.0
InternVL3-2B is an advanced multimodal large language model developed by OpenGVLab, featuring robust visual-language understanding and reasoning capabilities, supporting various multimodal tasks.
Text-to-Image Transformers Other
OpenGVLab
61
1
InternVL3 9B Instruct
MIT
InternVL3-9B-Instruct is a supervised fine-tuned model in the InternVL3 series, featuring powerful multimodal perception and reasoning capabilities and supporting image, text, and video inputs.
Image-to-Text Transformers Other
OpenGVLab
220
2
InternVL3 8B Instruct
Other
InternVL3-8B-Instruct is an advanced Multimodal Large Language Model (MLLM) that demonstrates exceptional multimodal perception and reasoning capabilities, supporting various functionalities such as tool usage, GUI agents, industrial image analysis, and 3D visual perception.
Image-to-Text Transformers Other
OpenGVLab
885
2
InternVL3 2B Instruct
Apache-2.0
InternVL3-2B-Instruct is a supervised fine-tuned version of InternVL3-2B. It has undergone native multimodal pretraining followed by SFT, and offers powerful multimodal perception and reasoning capabilities.
Text-to-Image Transformers Other
OpenGVLab
1,345
4
InternVL3 1B Instruct
Apache-2.0
InternVL3-1B-Instruct is a supervised fine-tuned model in the InternVL3 series, built on native multimodal pretraining, with exceptional multimodal perception and reasoning capabilities.
Image-to-Text Transformers Other
OpenGVLab
705
5
InternVL3 78B Instruct
Other
InternVL3-78B-Instruct is an advanced multimodal large language model developed by OpenGVLab, demonstrating exceptional multimodal perception and reasoning capabilities, supporting various tasks such as tool usage, GUI agents, industrial image analysis, and 3D visual perception.
Image-to-Text Transformers Other
OpenGVLab
345
5
InternVL3 1B
Other
InternVL3-1B is a 1B-parameter multimodal large language model in the InternVL3 series, integrating the InternViT visual encoder and Qwen2.5 language model, with exceptional multimodal perception and reasoning capabilities.
Transformers Other
FriendliAI
71
1
© 2025 AIbase